Introduction to harp

Andrew Singleton

A framewok for meteorological data analysis using R

Component packages

harpCore
harpIO
harpPoint
harpSpatial
harpVis
harp

Component packages

harpCore
harpIO
harpPoint
harpSpatial
harpVis
harp

Dependency Tree

harpCore
harpIO
harpPoint
harpSpatial
harpVis
harp

Dependency Tree

harpCore
harpIO
harpPoint
harpSpatial
harpVis
harp

Dependency Tree

harpCore
harpIO
harpPoint
harpSpatial
harpVis
harp

Installation

Basic installation

install.packages("remotes")

Basic installation

install.packages("remotes")
remotes::install_github("harphub/harp")

Basic installation

install.packages("remotes")
remotes::install_github("harphub/harp")

To avoid throttling, get a Personal Access Token

Basic installation

install.packages("remotes")
remotes::install_github("harphub/harp")

To avoid throttling, get a Personal Access Token

To speed up, use Posit Package Manager

Dependencies - meteogrid

  • Projections and coordinate transformations
  • Requires libproj-dev

Dependencies - grib

  • Read grib files
  • Requires libeccodes-dev
  • Requires Rgrib2
library(remotes)
install_github("harphub/Rgrib2")

Dependencies - FA

  • Read FA files
  • Requires Rfa
library(remotes)
install_github("harphub/Rfa")

Dependencies - NetCDF

  • Read NetCDF files
  • Requires libnetcdf-dev
  • Requires ncdf4
install.packages("ncdf4")

RStudio

  • IDE for R
    • Also good for Python, JS, CSS, md & more
  • Use JupyterHub at ECMWF
    • X11 Desktop
    • Terminal
module load R
module load rstudio
rstudio &

Reading data

Single files

  • read_grid("<file_name>", "<parameter>", ...)

  • For a single field, returns a geofield

  • For multiple fields, returns a geolist

  • With data_frame = TRUE, returns a harp_df data frame

  • Doesn’t know much about the data, forecast? analysis?

Single files

library(harp)
read_grid(
  "data/MEPS/2025/11/01/00/member_00/meps_sfc_01_20251101T00Z.nc",
  "T2m"
)
 : T2m K 
Time:
 2025/11/01 01:00
Domain summary:
949 x 1069 domain
Projection summary:
proj= lcc 
NE = ( 54.24126 , 71.57601 )
SW = ( 0.2782807 , 50.31962 )
Data summary:
255.4345 275.374 278.4716 278.4594 281.4023 289.2275 

Single files

read_grid(
  "data/MEPS/2025/11/01/00/member_00/meps_sfc_01_20251101T00Z.nc",
  c("T2m", "RH2m")
)
<harp_geolist[2]>
[[1]] <numeric geofield [949 x 1069] Min = 255.435 Max = 289.228 Mean = 278.459>
[[2]] <numeric geofield [949 x 1069] Min = 0.429 Max = 1.000 Mean = 0.876>

Single files

read_grid(
  "data/MEPS/2025/11/01/00/member_00/meps_sfc_01_20251101T00Z.nc",
  c("T2m", "RH2m"), 
  data_frame = TRUE
)
# A tibble: 2 × 8
  fcst_dttm           valid_dttm          lead_time parameter level_type level
* <dttm>              <dttm>                  <dbl> <chr>     <chr>      <dbl>
1 2025-11-01 01:00:00 2025-11-01 01:00:00         0 T2m       height         2
2 2025-11-01 01:00:00 2025-11-01 01:00:00         0 RH2m      height         2
  units  gridded_data
* <chr>     <geolist>
1 K     [949 × 1,069]
2 1     [949 × 1,069]

Multiple files

  • read_forecast()
  • read_analysis()
  • read_obs()
  • Uses templating system for file names
meps_tmplt <- "{YYYY}/{MM}/{DD}/{HH}/member_{MBR2}/meps_sfc_{LDT2}_{YYYY}{MM}{DD}T{HH}Z.nc"
meps_path <- "data/MEPS"

Multiple files

ens <- read_forecast(
  seq_dttm(2025112600, 2025112700, "1d"),
  "meps",
  "T2m",
  lead_time     = c(0, 3),
  members       = c(0, 1, 2, 9, 12),
  file_path     = meps_path,
  file_template = meps_tmplt,
  return_data   = TRUE 
)
::ensemble gridded forecast:: # A tibble: 4 × 14
  fcst_model fcst_dttm           lead_time parameter valid_dttm         
  <chr>      <dttm>                  <dbl> <chr>     <dttm>             
1 meps       2025-11-26 00:00:00         0 T2m       2025-11-26 00:00:00
2 meps       2025-11-26 00:00:00         3 T2m       2025-11-26 03:00:00
3 meps       2025-11-27 00:00:00         0 T2m       2025-11-27 00:00:00
4 meps       2025-11-27 00:00:00         3 T2m       2025-11-27 03:00:00
  level_type level units fcst_cycle   meps_mbr000   meps_mbr001   meps_mbr002
  <chr>      <dbl> <chr> <chr>          <geolist>     <geolist>     <geolist>
1 height         2 K     00         [949 × 1,069] [949 × 1,069] [949 × 1,069]
2 height         2 K     00         [949 × 1,069] [949 × 1,069] [949 × 1,069]
3 height         2 K     00         [949 × 1,069] [949 × 1,069] [949 × 1,069]
4 height         2 K     00         [949 × 1,069] [949 × 1,069] [949 × 1,069]
    meps_mbr009   meps_mbr012
      <geolist>     <geolist>
1 [949 × 1,069] [949 × 1,069]
2 [949 × 1,069] [949 × 1,069]
3 [949 × 1,069] [949 × 1,069]
4 [949 × 1,069] [949 × 1,069]

Multiple files

anl <- read_analysis(
  seq_dttm(2025112610, 2025112614, "1h"),
  "met_anl",
  "precipitation_amount",
  file_path     = "data/yr_short",
  file_template = "{YYYY}/{MM}/{DD}/met_analysis_1_0km_nordic_{YYYY}{MM}{DD}T{HH}Z.nc",
  file_format_opts = netcdf_opts(proj4_var = "projection_lcc")
)
::gridded analysis:: # A tibble: 5 × 7
  anl_model parameter            valid_dttm          level_type level units 
* <chr>     <chr>                <dttm>              <chr>      <dbl> <chr> 
1 met_anl   precipitation_amount 2025-11-26 10:00:00 unknown       NA kg/m^2
2 met_anl   precipitation_amount 2025-11-26 11:00:00 unknown       NA kg/m^2
3 met_anl   precipitation_amount 2025-11-26 12:00:00 unknown       NA kg/m^2
4 met_anl   precipitation_amount 2025-11-26 13:00:00 unknown       NA kg/m^2
5 met_anl   precipitation_amount 2025-11-26 14:00:00 unknown       NA kg/m^2
              anl
*       <geolist>
1 [1,796 × 2,321]
2 [1,796 × 2,321]
3 [1,796 × 2,321]
4 [1,796 × 2,321]
5 [1,796 × 2,321]

Plotting fields

Plot method

  • Uses ggplot() in the background
  • ggplot is a powerful plotting package for R
    • enables you to affect all aspects of the plot
  • Automatically facets by the valid_dttm column
  • Requires user to specify which column plot if more than 1 <geolist> column.

Precipitation analysis

plot(anl)

Precipitation analysis

Using colour scale for precipitation

plot(anl) + 
  scale_fill_precip()

Precipitation analysis

Using binned colour scale for precipitation

plot(anl) + 
  scale_fill_precip_b()

Ensembles

Need to get all of members in a single column

mbrs <- pivot_members(ens)

And filter to only those rows we want to plot

library(dplyr)
mbrs <- filter(
  mbrs, 
  lead_time == 0,
  fcst_dttm == as_dttm(2025112600)
)

Ensembles

Now we need to specify the column to facet by

plot(mbrs, facet_col = member)

Parameter Definitions

harp_params

  • Built in definitions for different file formats
show_param_defs()
# A tibble: 61 × 2
   name          description                                                    
   <chr>         <chr>                                                          
 1 accpcp12h     12-hour accumulated precipitation                              
 2 accpcp1h      1-hour accumulated precipitation                               
 3 accpcp24h     24-hour accumulated precipitation                              
 4 accpcp3h      3-hour accumulated precipitation                               
 5 accpcp6h      6-hour accumulated precipitation                               
 6 caf           Cloud area fraction at vertical levels                         
 7 cape          Convective available potential energy                          
 8 cbase         Height of cloud base                                           
 9 cc_below_7500 Cloud cover below 7500m                                        
10 cchigh        High level cloud cover                                         
11 cclow         Low level cloud cover                                          
12 ccmed         Medium level cloud cover                                       
13 cctot         Total integrated cloud cover                                   
14 cin           Convective inhibition                                          
15 d             Wind direction                                                 
16 d10m          Wind direction at 10m above the ground                         
17 g             Wind gust                                                      
18 g10m          Wind gust at 10m above the ground                              
19 gh            Geopotential height                                            
20 gmax          Maximum wind gust at 10m above the ground                      
21 lsm           land-sea mask                                                  
22 pcp           Accumulated precipitaion                                       
23 pmsl          Air pressure at mean sea level                                 
24 pressure      Atmospheric air pressure                                       
25 psfc          Surface air pressure                                           
26 q             Specific humidity of air                                       
27 q2m           Specific humidity of air at 2m above the ground                
28 rh            Relative humidity of air                                       
29 rh2m          Relative humidity of air at 2m above the ground                
30 s             Wind speed                                                     
31 s10m          Wind speed at 10m above the ground                             
32 sea_ice       Sea ice concentration                                          
33 sfc_geo       Surface geopotential                                           
34 smax          Maximum wind speed at 10m above the ground                     
35 snow          Snow depth                                                     
36 sst           Sea surface temperature                                        
37 t             Air temperature                                                
38 t0m           Skin temperature                                               
39 t2m           Air temperature at 2m above the ground                         
40 td            Dew point temperature                                          
41 td2m          Dew point temperature at 2m above the ground                   
42 tmax          Maximum air temperature at 2m above the ground                 
43 tmin          Minimum air temperature at 2m above the ground                 
44 topo          Height of topography above sea level                           
45 u             Wind speed in u direction                                      
46 u10m          Wind speed in u direction at 10m above the ground              
47 ugust         Wind gust in U direction                                       
48 ugust10m      Wind gust at 10m above the ground in U direction               
49 v             Wind speed in v direction                                      
50 v10m          Wind speed in v direction at 10m above the ground              
51 vgust         Wind gust in V direction                                       
52 vgust10m      Wind gust at 10m above the ground in V direction               
53 vis           Visibility in air                                              
54 w             Vertical (upward) wind speed                                   
55 wd            Wind direction calculated from U and V winds                   
56 wd10m         Wind direction at 10m above the ground calculated from U and V…
57 wg            Wind gust calculated from U and V gusts                        
58 wg10m         Wind gust at 10m above the ground calculated from U gust and V…
59 ws            Wind speed calculated from U and V winds                       
60 ws10m         Wind speed at 10m above the ground calculated from U and V win…
61 z             Geopotential                                                   

harp_params

show_param_defs("grib")
# A tibble: 50 × 2
   name     grib_name                         
   <chr>    <chr>                             
 1 caf      tcc                               
 2 cape     cape                              
 3 cchigh   hcc                               
 4 cclow    lcc                               
 5 ccmed    mcc                               
 6 cctot    tcc                               
 7 d        wdir                              
 8 d10m     wdir                              
 9 g        fg                                
10 g10m     fg                                
11 gh       gh                                
12 lsm      lsm                               
13 pcp      tp                                
14 pmsl     msl                               
15 pressure pres                              
16 psfc     pres                              
17 q        q                                 
18 q2m      c(2q, q)                          
19 rh       r                                 
20 rh2m     c(2r, r)                          
21 s        ws                                
22 s10m     c(10si, ws)                       
23 sea_ice  icec                              
24 sfc_geo  z                                 
25 sst      sst                               
26 t        t                                 
27 t0m      c(0t, t)                          
28 t2m      c(2t, t)                          
29 td       td                                
30 td2m     td                                
31 tmax     c(mx2t, mxt)                      
32 tmin     c(mn2t, mnt)                      
33 topo     orog                              
34 u        u                                 
35 u10m     c(10u, u)                         
36 ugust    ugst                              
37 ugust10m ugst                              
38 v        v                                 
39 v10m     c(10v, v)                         
40 vgust    vgst                              
41 vgust10m vgst                              
42 vis      vis                               
43 w        w                                 
44 wd       list(u = u, v = v)                
45 wd10m    list(u = c(10u, u), v = c(10v, v))
46 wg       list(u = ugst, v = vgst)          
47 wg10m    list(u = ugst, v = vgst)          
48 ws       list(u = u, v = v)                
49 ws10m    list(u = c(10u, u), v = c(10v, v))
50 z        z                                 

Applying a function

harp_params$ws10m
$description
[1] "Wind speed at 10m above the ground calculated from U and V winds at 10m above the ground"

$min
[1] 0

$max
[1] 100

$grib
$grib$name
$grib$name$u
[1] "10u" "u"  

$grib$name$v
[1] "10v" "v"  


$grib$level_type
[1] "heightAboveGround" "surface"          

$grib$level
[1] 10


$netcdf
$netcdf$name
$netcdf$name$u
[1] "x_wind_10m"

$netcdf$name$v
[1] "y_wind_10m"



$wrf
$wrf$name
$wrf$name$u
[1] "U10"

$wrf$name$v
[1] "V10"



$fa
$fa$name
$fa$name$u
[1] "CLSVENT.ZONAL   "

$fa$name$v
[1] "CLSVENT.MERIDIEN"


$fa$units
[1] "m/s"


$func
function (u, v) 
{
    res <- sqrt(u^2 + v^2)
    attr(res, "info")[["name"]] <- "Wind speed"
    res
}
<bytecode: 0x55c7723bfc10>
<environment: 0x55c772395960>

Modifying and adding parameters

  • modify_param_def()
  • add_param_def()
my_params = modify_param_def(
  "pcp",
  netcdf = new_netcdf_param("precipitation_amount")
)

read_grid(
  "data/yr_short/2025/11/27/met_analysis_1_0km_nordic_20251127T12Z.nc",
  "pcp",
  param_defs = my_params,
  file_format_opts = netcdf_opts(proj4_var = "projection_lcc")
)
 : pcp kg/m^2 
Time:
 2025/11/27 12:00
Domain summary:
1796 x 2321 domain
Projection summary:
proj= lcc 
NE = ( 41.76428 , 72.18485 )
SW = ( 1.918465 , 52.30272 )
Data summary:
0 0 0 0.1722853 0.1437523 7.442674 

Geographic Transformations

After reading

  • geo_regrid() - Interpolate between grids
  • geo_points() - Interpolate to point locations
  • geo_upscale() - Upscale to lower resolution
  • geo_zoom() - Zoom in on a centre point
  • geo_subgrid() - Extract a subgrid based on (i,j) locations
  • geo_xsection() - Extract a cross section between 2 points
  • geo_reproject() - Reproject point data

Regridding example

geo_regrid(anl, get_domain(ens$meps_mbr000)) |> 
  plot()

At read time

  • Use transformation = "<type>" with transformation_opts
  • Interpolation to points is transformation = "interpolate"
    • Uses built in station_list if nothing is specified
read_grid(
  "data/MEPS/2025/11/27/00/member_00/meps_sfc_12_20251127T00Z.nc",
  "ws10m",
  transformation = "interpolate"
)
# A tibble: 1,391 × 11
   fcdate              validdate           lead_time parameter level_type level
   <dttm>              <dttm>                  <dbl> <chr>     <chr>      <dbl>
 1 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 2 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 3 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 4 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 5 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 6 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 7 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 8 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
 9 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
10 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
   units   SID   lat   lon station_data
   <chr> <int> <dbl> <dbl>        <dbl>
 1 m/s    1001  70.9 -8.67        10.8 
 2 m/s    1010  69.3 16.1         12.8 
 3 m/s    1014  69.2 17.9          4.31
 4 m/s    1015  69.6 17.8         11.5 
 5 m/s    1018  69.2 16.0         10.1 
 6 m/s    1023  69.1 18.5          2.38
 7 m/s    1025  69.7 18.9         11.4 
 8 m/s    1026  69.7 18.9         12.9 
 9 m/s    1027  69.7 18.9         12.9 
10 m/s    1028  74.5 19.0          9.46
# ℹ 1,381 more rows

Specify locations and method

my_stations <- tribble(
  ~SID, ~lat, ~lon, ~name,
  1, 60.169832654, 24.938162914, "HELSINKI STATION",
  2, 60.203844,   24.960841,     "FMI"
)
read_grid(
  "data/MEPS/2025/11/27/00/member_00/meps_sfc_12_20251127T00Z.nc",
  "ws10m",
  transformation = "interpolate",
  transformation_opts = interpolate_opts(my_stations, method = "bilinear")
)
# A tibble: 2 × 11
  fcdate              validdate           lead_time parameter level_type level
  <dttm>              <dttm>                  <dbl> <chr>     <chr>      <dbl>
1 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
2 2025-11-27 12:00:00 2025-11-27 12:00:00         0 ws10m     height        10
  units   SID   lat   lon station_data
  <chr> <dbl> <dbl> <dbl>        <dbl>
1 m/s       1  60.2  24.9         5.52
2 m/s       2  60.2  25.0         4.85

Cross sections

  • transformation = "xsection"
  • transformation_opts = xsection_opts(a = ..., b = ...)
oslo <- c(10.72, 59.9422)
helsinki <- c(24.960841, 60.203844)

xs <- read_grid(
  "data/MEPS/2025/11/27/meps_det_2_5km_20251127T00Z.nc",
  "caf", 
  lead_time           = 12,
  transformation      = "xsection",
  transformation_opts = xsection_opts(oslo, helsinki),
  vertical_coordinate = "model",
  data_frame          = TRUE
)

Plotting cross sections - model levels

  • No specialized plot method
  • Use ggplot() directly

Plotting cross sections - model levels

ggplot(xs, aes(xs = xsection_data)) + 
  geom_xs() + 
  scale_y_reverse()

Plotting cross sections - pressure levels

  • Requires surface pressure to compute the pressure levels
  • By default assumes harmonie 65 levels
xs <- read_grid(
  "data/MEPS/2025/11/27/meps_det_2_5km_20251127T00Z.nc",
  c("caf", "psfc"), 
  lead_time           = 12,
  transformation      = "xsection",
  transformation_opts = xsection_opts(oslo, helsinki),
  vertical_coordinate = "model",
  data_frame          = TRUE
)

And each parameter needs to be in its own column

xs <- pivot_parameters_wider(xs, xsection_data)

Plotting cross sections - pressure levels

ggplot(xs, aes(xs = caf, psfc = psfc)) + 
  geom_xs_pressure(interpolate = TRUE) + 
  scale_fill_viridis_c(option = "F") +
  scale_y_reverse() + 
  coord_cartesian(expand = FALSE)

Plotting cross sections - pressure levels

ggplot(xs, aes(xs = caf, psfc = psfc)) + 
  geom_xs_pressure(
    interpolate = TRUE,
    scale_pressure = 1 / 100,
    scale_distance = 1 / 1000, 
    topo_colour    = "grey",
    topo_linewidth = 0.5
  ) + 
  scale_fill_viridis_c("Cloud\nFraction", option = "F") +
  scale_y_reverse() + 
  coord_cartesian(expand = FALSE) + 
  labs(
    x = "Oslo -> Helsinki [km]",
    y = "Pressure [hPa]",
    title = "Cloud fraction",
    subtitle = "for a cross section between Oslo and Helsinki"
  )

Plotting cross sections - pressure levels

Plotting cross sections - height levels

  • Requires surface pressure, temperature, specific humidity and surface elevation
xs <- read_grid(
  "data/MEPS/2025/11/27/meps_det_2_5km_20251127T00Z.nc",
  c("caf", "psfc", "sfc_geo", "T", "Q"), 
  lead_time           = 12,
  transformation      = "xsection",
  transformation_opts = xsection_opts(oslo, helsinki),
  vertical_coordinate = "model",
  data_frame          = TRUE
)

And each parameter needs to be in its own column

xs <- pivot_parameters_wider(xs, xsection_data)

Plotting cross sections - height levels

ggplot(xs, aes(xs = caf, psfc = psfc, temp = T, spec_hum = Q, topo = sfc_geo)) + 
  geom_xs_height(interpolate = TRUE) + 
  scale_fill_viridis_c(option = "F") +
  coord_cartesian(expand = FALSE)

Plotting cross sections - height levels

ggplot(xs, aes(xs = caf, psfc = psfc, temp = T, spec_hum = Q, topo = sfc_geo)) + 
  geom_xs_height(
    interpolate = TRUE,
    height_lims    = c(NA, 14000),  
    scale_distance = 1 / 1000, 
    topo_colour    = "grey",
    topo_linewidth = 0.5
  ) + 
  scale_fill_viridis_c("Cloud\nFraction", option = "F") +
  coord_cartesian(expand = FALSE) + 
  labs(
    x = "Oslo -> Helsinki [km]",
    y = "Pressure [hPa]",
    title = "Cloud fraction",
    subtitle = "for a cross section between Oslo and Helsinki"
  )

Plotting cross sections - height levels

Cross section map

  • Draws a line on a map showing the cross section location
  • Often needs some experimentation to get text labels right
ggplot(xs, aes(xs = psfc)) + 
  geom_xs_map(
     map_fill      = "grey",
    section_labels = c("Oslo", "Helsinki"),
    section_label_hjust = 0.5, 
    section_label_vjust = -0.5
  ) +
  theme_harp_map() + 
  coord_equal(expand = FALSE) + 
  theme(panel.background = element_rect(fill = "skyblue"))

Cross section map

Point Verification

Preparing forecasts

  • Use read_forecast(..., transformation = "interpolate")
  • Set parameter = NULL for vfld files
  • Set output_file_opts = fctable_opts(path)
    • Outputs as SQLite, one file per parameter, per month, per cycle

Preparing forecasts

  • Set output_file_opts = fcparquet_opts(path)
    • Outputs dataset of parquet files
    • Uses “hive partitioning”
    • One partition per parameter and model run
    • Experimental
    • Might be slow on Atos due to file system

Preparing observations

  • Use read_obs(...)
  • Only works for vobs and obsoul files
    • Local readers: FROST at MET Norway
  • Set output_file_opts = obstable_opts(path)
    • Outputs as SQLite, one file per year

Preparing observations

  • Set output_file_opts = obsparquet_opts(path)
    • Outputs dataset of parquet files
    • Uses “hive partitioning”
    • One partition day
    • Experimental
    • Might be slow on Atos due to file system

Read point forecasts

  • fcst <- read_point_forecast(...)
  • Need to set date-times, model (experiment) name(s), parameter & path
  • Defaults to lead times 0 - 48h, by 3h
  • Can set which station IDs to read
  • Automatic decumulation of accumumlated parameters

Read point obseravations

  • obs <- read_point_obs(...)
  • Can get date-times and stations from forecast
    • unique_valid_dttm(fcst)
    • unique_stations(fcst)
  • Can set gross error bounds
    • min_allowed
    • max_allowed

Get ready to verify

  • If more than one model, select only the common cases
    • common_cases()
  • Join the observations to the forecasts
    • fcst <- join_to_fcst(fcst, obs)
    • Matches time and location
    • Checks for same units
  • Representativeness error check
    • check_obs_against_fcst()

Verify

  • ens_verify(fcst, obs_col)
  • det_verify(fcst, obs_col)
  • Add thresholds to get threshold passing scores
  • Set comparator for direction of threshold passing
    • Can also use 2 thresholds to define classes
  • Define groupings for how to aggregate scores
    • Defaults to lead time

Verification aggregation by groups

  • Time groups should generally be in every group
    • Use make_verif_groups() for easier definition
    • Can also be used to set (e.g.) p for upper air
  • Use join_multi_groups() to prepare data for (e.g.) station groups
    • Takes care of cases where stations are in multiple groups

Example - read forecast

fcst <- read_point_forecast(
  seq_dttm(2025100100, 2025100718, "6h"),
  c("MEPS", "IFS_hires"),
  "T2m",
  file_path = "data/point_data/FCTABLE",
)

fcst <- common_cases(fcst)

Example - read observations

obs <- read_point_obs(
  unique_valid_dttm(fcst), 
  "T2m",
  stations  = unique_stations(fcst),
  file_path = "data/point_data/OBSTABLE" 
)

Example - scale to degrees C

fcst <- scale_param(fcst, -273.15, "degC")

obs <- scale_param(obs, -273.15, "degC", col = T2m)

Example - join & representativeness errors

fcst <- join_to_fcst(fcst, obs)

fcst <- check_obs_against_fcst(fcst, T2m)

Example - verify

verif <- det_verify(
  fcst, 
  T2m,
  thresholds = seq(-25, 10, 5)
)

Plot score

  • use plot_point_verif()
plot_point_verif(verif, bias_rmse)

Plot threshold score

plot_point_verif(
  verif, 
  equitable_threat_score, 
  facet_by = vars(threshold)
)

Plot using score_lookup()

fb <- score_lookup("fb")
plot_point_verif(
  verif, 
  {{fb}},
  filter_by = vars(threshold == "ge_10")
)

Use the visualization app

  • First save the data
save_point_verif(verif, "data/verification/basic")
  • Then run the app
shiny_plot_point_verif(
  file.path(getwd(), "data/verification"), 
  FALSE, "light"
)

More complicated groupings

my_groups <- make_verif_groups(
  time_groups = c("lead_time", "valid_dttm", "valid_hour"),
  groups = c("fcst_cycle", "station_group")
)

fcst <- expand_date(fcst, valid_dttm)
fcst <- join_multi_groups(fcst, station_groups)

More complicated groupings

verif <- det_verify(
  fcst, 
  T2m, 
  groupings = my_groups
)
  • More difficult to plot; better to use the app
save_point_verif(verif, "data/verification/grouped")

And now the quick way

Everything in (almost) one function

params <- c(
  make_verif_param(
    "2m temp",
    fcst_param       = "T2m",
    fcst_scaling     = make_scaling(-273.15, "degC"),
    obs_param        = "T2m",
    obs_scaling      = make_scaling(-273.15, "degC"),
    verif_comparator = c("ge", "between"),
    verif_thresholds = list(seq(-20, 10, 5), c(-Inf, seq(-20, 10, 5), Inf))
  ),
  make_verif_param(
    "2m spec hum",
    fcst_param       = "Q2m",
    fcst_scaling     = make_scaling(1000, "g/kg", TRUE),
    obs_param        = "Q2m",
    obs_scaling      = make_scaling(1000, "g/kg", TRUE),
    verif_thresholds = seq(10, 16, 2)
  ),
  make_verif_param(
    "10m wind speed",
    fcst_param       = "S10m",
    obs_param        = "S10m",
    verif_comparator = c("ge", "between"),
    verif_thresholds = list(seq(2, 10, 2), c(0, seq(2, 10, 2), Inf))
  )
)

Everything in (almost) one function

run_point_verif(
  seq_dttm(2025100100, 2025100718, "6h"),
  c("MEPS", "IFS_hires"),
  params,
  main_fcst_model      = "MEPS",
  dttm_rounding        = "1d",
  dttm_rounding_offset = "12h",
  fcst_path            = "data/point_data/FCTABLE",
  obs_path             = "data/point_data/OBSTABLE",
  out_path             = "data/verification/quick",
  defaults             = make_verif_defaults(
    verif_groups = make_verif_groups(
      c("lead_time", "valid_dttm", "valid_hour"),
      "fcst_cycle"
    )
  ) 
)
# A tibble: 2 × 5
    SID valid_dttm            T2m dist_to_fcst tolerance
  <int> <dttm>              <dbl>        <dbl>     <dbl>
1 22003 2025-10-01 12:00:00 -40.3         50.5      7.49
2 27106 2025-10-01 06:00:00  21.6         18.8     16.2 
# A tibble: 67 × 5
     SID valid_dttm           S10m dist_to_fcst tolerance
   <int> <dttm>              <dbl>        <dbl>     <dbl>
 1  1018 2025-10-02 09:00:00  17.3         9.70      9.30
 2  1064 2025-10-03 21:00:00  17.7        10.2      10.1 
 3  1064 2025-10-04 03:00:00  18.7        10.3      10.1 
 4  2705 2025-10-01 06:00:00   8.7         5.01      4.79
 5  2705 2025-10-01 18:00:00  11           7.38      6.62
 6  2705 2025-10-02 12:00:00  12.3         6.60      5.91
 7  2705 2025-10-02 15:00:00  12.6         7.34      5.91
 8  2705 2025-10-02 18:00:00  11.9         6.74      6.62
 9  2705 2025-10-03 03:00:00  10.6         5.75      5.06
10  2705 2025-10-03 12:00:00  13.9         7.70      5.91
11  2705 2025-10-03 15:00:00  12.3         6.77      5.91
12  2705 2025-10-03 18:00:00  13.6         7.77      6.62
13  2705 2025-10-04 00:00:00  10.7         5.52      5.06
14  2705 2025-10-04 03:00:00  10.1         5.60      5.06
15  2705 2025-10-04 06:00:00  10.1         5.49      4.79
16  2705 2025-10-04 09:00:00  12.8         7.45      4.79
17  2705 2025-10-04 15:00:00   9.7         5.95      5.91
18  2705 2025-10-04 18:00:00  11.9         6.97      6.62
19  2705 2025-10-05 00:00:00  10           6.27      5.06
20  2705 2025-10-05 03:00:00  10.5         6.68      5.06
21  2705 2025-10-05 06:00:00  12.3         7.58      4.79
22  2705 2025-10-05 12:00:00  12           6.20      5.91
23  2705 2025-10-05 15:00:00  12.2         7.03      5.91
24  2705 2025-10-05 18:00:00  16.2        11.4       6.62
25  2705 2025-10-05 21:00:00  13           8.26      6.62
26  2705 2025-10-06 00:00:00  12.2         8.11      5.06
27  2705 2025-10-06 03:00:00  11.5         7.54      5.06
28  2705 2025-10-06 06:00:00  11           6.94      4.79
29  2705 2025-10-06 09:00:00  11.2         7.58      4.79
30  2705 2025-10-07 00:00:00   7.4         5.07      5.06
31  2705 2025-10-07 06:00:00   9.7         5.67      4.79
32  2705 2025-10-07 09:00:00   9.9         5.30      4.79
33  2705 2025-10-07 21:00:00  12           6.75      6.62
34  2705 2025-10-08 00:00:00  11.9         6.37      5.06
35  2705 2025-10-08 03:00:00  10.4         5.53      5.06
36  2705 2025-10-08 06:00:00   9.3         4.80      4.79
37  2705 2025-10-08 15:00:00   9.5         6.53      5.91
38  2705 2025-10-08 21:00:00  10.9         6.72      6.62
39  2705 2025-10-09 00:00:00  12.3         8.13      5.06
40  2705 2025-10-09 03:00:00  12.6         8.01      5.06
41  2705 2025-10-09 06:00:00  12.5         7.30      4.79
42  2705 2025-10-09 09:00:00  12.8         6.95      4.79
43  2705 2025-10-09 15:00:00  11.5         6.37      5.91
44  2705 2025-10-09 18:00:00  13.6         9.70      6.62
45  2820 2025-10-05 18:00:00  16.4         9.91      9.53
46  2821 2025-10-05 21:00:00  12           6.88      6.74
47  2868 2025-10-03 00:00:00   8.7         4.82      4.60
48  2868 2025-10-04 00:00:00   9           5.16      4.60
49  2868 2025-10-04 03:00:00   9.1         5.75      4.60
50  2868 2025-10-05 06:00:00  11.7         5.72      4.58
51  2868 2025-10-05 09:00:00  12           5.84      4.58
52  2868 2025-10-05 12:00:00  12.1         5.87      5.32
53  2868 2025-10-05 15:00:00  10.4         5.38      5.32
54  2868 2025-10-05 18:00:00  12.2         6.48      6.01
55  2868 2025-10-05 21:00:00  10.9         6.03      6.01
56  2868 2025-10-06 00:00:00  11.5         6.67      4.60
57  2868 2025-10-06 03:00:00   9.6         4.76      4.60
58  2868 2025-10-06 06:00:00   8.8         4.70      4.58
59  2868 2025-10-06 09:00:00   9           5.02      4.58
60  2868 2025-10-08 00:00:00   9.3         4.77      4.60
61  6034 2025-10-06 03:00:00  29.8        24.1       3.76
62 10306 2025-10-05 18:00:00  10.8         4.51      2.77
63 10453 2025-10-04 15:00:00  19.9        12.2      11.0 
64 10453 2025-10-04 21:00:00  24.9        15.8      13.2 
65 22003 2025-10-01 12:00:00  99          93.9      12.5 
66 27113 2025-10-07 09:00:00  60          58.4       5.81
67 27113 2025-10-07 15:00:00  60          58.1       5.32
$`2m temp`
::det_summary_scores:: # A tibble: 340 × 15
   fcst_model lead_time num_stations num_cases   bias  rmse   mae  stde
   <chr>      <chr>            <int>     <int>  <dbl> <dbl> <dbl> <dbl>
 1 MEPS       0h                1044     28896 -0.140 0.973 0.710 0.963
 2 MEPS       12h               1044     28887 -0.248 1.29  0.947 1.27 
 3 MEPS       15h               1041     28649 -0.290 1.29  0.936 1.25 
 4 MEPS       18h               1044     28893 -0.311 1.34  0.991 1.31 
 5 MEPS       21h               1041     28648 -0.342 1.31  0.963 1.27 
 6 MEPS       24h               1044     28891 -0.352 1.36  1.01  1.32 
 7 MEPS       27h               1041     28650 -0.363 1.33  0.978 1.28 
 8 MEPS       30h               1044     28891 -0.362 1.38  1.03  1.33 
 9 MEPS       33h               1041     28652 -0.383 1.34  0.997 1.29 
10 MEPS       36h               1043     28892 -0.385 1.41  1.05  1.36 
# ℹ 330 more rows
# ℹ 7 more variables: mean_fcst <dbl>, mean_obs <dbl>, valid_dttm <chr>,
#   valid_hour <chr>, fcst_cycle <chr>, hexbin <list>, parameter <chr>

::det_threshold_scores:: # A tibble: 3,400 × 50
   fcst_model threshold lead_time num_stations num_cases num_cases_for_thresho…¹
   <chr>      <chr>     <chr>            <int>     <int>                   <dbl>
 1 MEPS       ge_-10    0h                1044     28896                   28896
 2 MEPS       ge_-10    12h               1044     28887                   28887
 3 MEPS       ge_-10    15h               1041     28649                   28649
 4 MEPS       ge_-10    18h               1044     28893                   28893
 5 MEPS       ge_-10    21h               1041     28648                   28648
 6 MEPS       ge_-10    24h               1044     28891                   28891
 7 MEPS       ge_-10    27h               1041     28650                   28650
 8 MEPS       ge_-10    30h               1044     28891                   28891
 9 MEPS       ge_-10    33h               1041     28652                   28652
10 MEPS       ge_-10    36h               1043     28892                   28892
# ℹ 3,390 more rows
# ℹ abbreviated name: ¹​num_cases_for_threshold_total
# ℹ 44 more variables: num_cases_for_threshold_observed <dbl>,
#   num_cases_for_threshold_forecast <dbl>, hits <dbl>, false_alarms <dbl>,
#   misses <dbl>, correct_rejections <dbl>, threat_score <dbl>, hit_rate <dbl>,
#   miss_rate <dbl>, false_alarm_rate <dbl>, false_alarm_ratio <dbl>,
#   heidke_skill_score <dbl>, pierce_skill_score <dbl>, …

--harp verification for 2m temp--
 # for forecasts from 00:00 UTC 01 Oct 2025 to 18:00 UTC 07 Oct 2025
 # using 1045 observation stations
 # for verification groups: 
    -> lead_time
    -> valid_dttm
    -> valid_hour
    -> lead_time & fcst_cycle
    -> valid_dttm & fcst_cycle
    -> valid_hour & fcst_cycle

$`2m spec hum`
::det_summary_scores:: # A tibble: 340 × 15
   fcst_model lead_time num_stations num_cases    bias  rmse   mae  stde
   <chr>      <chr>            <int>     <int>   <dbl> <dbl> <dbl> <dbl>
 1 MEPS       0h                 842     23218 -0.0623 0.454 0.343 0.450
 2 MEPS       12h                842     23192 -0.0758 0.498 0.379 0.492
 3 MEPS       15h                842     23065 -0.0975 0.511 0.388 0.501
 4 MEPS       18h                842     23197 -0.0956 0.516 0.393 0.507
 5 MEPS       21h                842     23065 -0.111  0.522 0.399 0.510
 6 MEPS       24h                842     23199 -0.109  0.528 0.404 0.517
 7 MEPS       27h                842     23067 -0.122  0.533 0.409 0.519
 8 MEPS       30h                842     23200 -0.114  0.544 0.416 0.531
 9 MEPS       33h                842     23067 -0.127  0.553 0.425 0.538
10 MEPS       36h                841     23199 -0.122  0.559 0.428 0.546
# ℹ 330 more rows
# ℹ 7 more variables: mean_fcst <dbl>, mean_obs <dbl>, valid_dttm <chr>,
#   valid_hour <chr>, fcst_cycle <chr>, hexbin <list>, parameter <chr>

::det_threshold_scores:: # A tibble: 680 × 50
   fcst_model threshold lead_time num_stations num_cases num_cases_for_thresho…¹
   <chr>      <chr>     <chr>            <int>     <int>                   <dbl>
 1 MEPS       ge_10     0h                 842     23218                     140
 2 MEPS       ge_10     12h                842     23192                     140
 3 MEPS       ge_10     15h                842     23065                     150
 4 MEPS       ge_10     18h                842     23197                     143
 5 MEPS       ge_10     21h                842     23065                     156
 6 MEPS       ge_10     24h                842     23199                     142
 7 MEPS       ge_10     27h                842     23067                     150
 8 MEPS       ge_10     30h                842     23200                     158
 9 MEPS       ge_10     33h                842     23067                     152
10 MEPS       ge_10     36h                841     23199                     149
# ℹ 670 more rows
# ℹ abbreviated name: ¹​num_cases_for_threshold_total
# ℹ 44 more variables: num_cases_for_threshold_observed <dbl>,
#   num_cases_for_threshold_forecast <dbl>, hits <dbl>, false_alarms <dbl>,
#   misses <dbl>, correct_rejections <dbl>, threat_score <dbl>, hit_rate <dbl>,
#   miss_rate <dbl>, false_alarm_rate <dbl>, false_alarm_ratio <dbl>,
#   heidke_skill_score <dbl>, pierce_skill_score <dbl>, …

--harp verification for 2m spec hum--
 # for forecasts from 00:00 UTC 01 Oct 2025 to 18:00 UTC 07 Oct 2025
 # using 842 observation stations
 # for verification groups: 
    -> lead_time
    -> valid_dttm
    -> valid_hour
    -> lead_time & fcst_cycle
    -> valid_dttm & fcst_cycle
    -> valid_hour & fcst_cycle

$`10m wind speed`
::det_summary_scores:: # A tibble: 340 × 15
   fcst_model lead_time num_stations num_cases  bias  rmse   mae  stde mean_fcst
   <chr>      <chr>            <int>     <int> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1 MEPS       0h                 996     27585 0.349  1.76  1.25  1.73      4.57
 2 MEPS       12h                996     27582 0.339  1.77  1.25  1.74      4.64
 3 MEPS       15h                995     27412 0.326  1.80  1.27  1.77      4.68
 4 MEPS       18h                996     27587 0.333  1.80  1.28  1.77      4.66
 5 MEPS       21h                995     27409 0.313  1.82  1.28  1.80      4.69
 6 MEPS       24h                996     27588 0.325  1.83  1.29  1.80      4.67
 7 MEPS       27h                995     27413 0.314  1.83  1.29  1.80      4.70
 8 MEPS       30h                996     27588 0.320  1.83  1.29  1.81      4.68
 9 MEPS       33h                995     27418 0.300  1.86  1.31  1.83      4.70
10 MEPS       36h                996     27587 0.301  1.86  1.32  1.84      4.68
# ℹ 330 more rows
# ℹ 6 more variables: mean_obs <dbl>, valid_dttm <chr>, valid_hour <chr>,
#   fcst_cycle <chr>, hexbin <list>, parameter <chr>

::det_threshold_scores:: # A tibble: 3,740 × 50
   fcst_model threshold lead_time num_stations num_cases num_cases_for_thresho…¹
   <chr>      <chr>     <chr>            <int>     <int>                   <dbl>
 1 MEPS       ge_2      0h                 996     27585                   22188
 2 MEPS       ge_2      12h                996     27582                   22440
 3 MEPS       ge_2      15h                995     27412                   22530
 4 MEPS       ge_2      18h                996     27587                   22463
 5 MEPS       ge_2      21h                995     27409                   22566
 6 MEPS       ge_2      24h                996     27588                   22547
 7 MEPS       ge_2      27h                995     27413                   22625
 8 MEPS       ge_2      30h                996     27588                   22621
 9 MEPS       ge_2      33h                995     27418                   22757
10 MEPS       ge_2      36h                996     27587                   22739
# ℹ 3,730 more rows
# ℹ abbreviated name: ¹​num_cases_for_threshold_total
# ℹ 44 more variables: num_cases_for_threshold_observed <dbl>,
#   num_cases_for_threshold_forecast <dbl>, hits <dbl>, false_alarms <dbl>,
#   misses <dbl>, correct_rejections <dbl>, threat_score <dbl>, hit_rate <dbl>,
#   miss_rate <dbl>, false_alarm_rate <dbl>, false_alarm_ratio <dbl>,
#   heidke_skill_score <dbl>, pierce_skill_score <dbl>, …

--harp verification for 10m wind speed--
 # for forecasts from 00:00 UTC 01 Oct 2025 to 18:00 UTC 07 Oct 2025
 # using 998 observation stations
 # for verification groups: 
    -> lead_time
    -> valid_dttm
    -> valid_hour
    -> lead_time & fcst_cycle
    -> valid_dttm & fcst_cycle
    -> valid_hour & fcst_cycle

An even easier way

new_point_verif_project("<dir>")

Creates:

  • a params file
  • a config file
  • a script to be run with Rscript

There’s a lot more!